Netspeak - Assisting Writers in Choosing Words
Abstract
NETSPEAK is a Web service that helps writers find alternative expressions for what they want to say. It provides a large index of writing samples in the form of n-grams, n ≤ 5, along with an efficient means of retrieving them via wildcard queries. When in doubt about a phrasing, a user can gather additional evidence by retrieving samples that match a given context. The figure below shows the results for a query in which a user is interested in the two most frequently written words between “looks” and “me”. The first two columns indicate how customary each result is, and the user can select the one most appropriate for her sentence. To provide a rich choice of writing samples, we index the Google n-gram corpus, which was compiled from a large portion of the English Web and consists of more than 3 billion n-grams along with their occurrence frequencies [2]. We have developed a space-optimal inverted index based on minimal perfect hashing. The hash function maps the vocabulary V of the corpus to the storage positions of the postlists. A hash function is perfect if it produces no hash collisions for the key set V, and it is minimal if the number of storage positions required does not exceed |V|. The hash function is constructed with the CHD algorithm, which incurs a space overhead of 2.07 × |V| bits [1]. Moreover, the index provides a top-k retrieval strategy for finding the n-grams that match a query; details can be found in [3]. The table below shows selected performance data of our index. NETSPEAK is currently deployed on a cluster of 15 computers. In a load test, the service was measured to process about 10,000 queries per second.
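To make the idea of wildcard retrieval over n-grams concrete, the following is a minimal sketch, not the NETSPEAK implementation. It assumes a hypothetical query syntax in which `?` stands for exactly one word, uses a tiny in-memory table with made-up frequencies instead of the Google n-gram corpus, and replaces the inverted index and minimal perfect hashing described above with a plain linear scan. The names `NGRAMS`, `matches`, and `top_k` are illustrative only.

```python
from typing import Dict, List, Tuple

# Toy n-gram table. The frequencies are invented for illustration
# and are NOT real corpus counts.
NGRAMS: Dict[Tuple[str, ...], int] = {
    ("looks", "good", "to", "me"): 900,
    ("looks", "fine", "to", "me"): 400,
    ("looks", "like", "me"): 700,
    ("looks", "at", "me"): 650,
    ("looks", "good"): 5000,
}


def matches(pattern: Tuple[str, ...], ngram: Tuple[str, ...]) -> bool:
    """Return True if the n-gram fits the pattern.

    Assumption: "?" matches exactly one word; every other token
    must match the word at the same position literally.
    """
    if len(pattern) != len(ngram):
        return False
    return all(p == "?" or p == w for p, w in zip(pattern, ngram))


def top_k(query: str, k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent n-grams matching the query.

    This is a linear scan over the toy table, used here only to show
    the query semantics; it is not an index-based top-k strategy.
    """
    pattern = tuple(query.split())
    hits = [(" ".join(g), f) for g, f in NGRAMS.items() if matches(pattern, g)]
    # Sort by descending frequency; break ties alphabetically.
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits[:k]


if __name__ == "__main__":
    for phrase, freq in top_k("looks ? ? me", 2):
        print(f"{freq:>6}  {phrase}")
```

A writer unsure about the words between “looks” and “me” would, under these assumptions, issue `top_k("looks ? ? me", 2)` and compare the frequencies of the returned phrases to judge which wording is more customary.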
Similar Resources
Choosing words in computer-generated weather forecasts
One of the main challenges in automatically generating textual weather forecasts is choosing appropriate English words to communicate numeric weather data. A corpus-based analysis of how humans write forecasts showed that there were major differences in how individual writers performed this task, that is, in how they translated data into words. These differences included both different preferen...
A Web-based Application for Writing Novels
In this paper, we propose a method for assisting amateur writers in novel writing. Amateur writers can publish their work intensively through web infrastructures. This situation is beneficial, because it encourages amateur writers to enhance their skills by sharing their work. However, writing a good novel is difficult for a novice, because the novel-writing task requires the management of many...
Retrieving Customary Web Language to Assist Writers
This paper introduces NETSPEAK, a Web service which assists writers in finding adequate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. If in doubt about a wording, a user can specify a query that has wildcards inserted at those positions where she feels uncertain. Queries d...
Proceedings of the Seventh Web as Corpus Workshop (WAC 7)
We will discuss backgrounds, technology, and applications developed in the Webis Research Group, whereas the talk’s common thread is the exploitation of the web as a corpus. Three different applications will reveal different rationales and possibilities when operationalizing text reuse and language reuse on a large scale. 1. The Netspeak word search engine reuses the web as a corpus of writing ...
Discourse Community Collocations and L2 Writing Content
Taking the position that writing can be an important skill for fostering knowledge-building pedagogy, this article explores vocabulary as a supportive tool for this purpose. With this in mind, a compilation of conceptually loaded vocabularies pertaining to seven discourse communities was developed, two of which were given to a group of L2 writers to investigate the implications of phraseology for...